Large-Scale Structure in the Distribution of Galaxies as a Probe of Cosmological Models

Michael A. Strauss, Princeton University Observatory, Princeton, NJ 08544

(strauss@astro.princeton.edu) 

February 5, 2008 


The last 20 years have seen an explosion in our understanding of the large-scale distribution 
and motions of galaxies in the nearby universe. The field has moved from a largely qualitative, 
morphological description of the structures seen in the galaxy distribution, to a rich and increasingly 
rigorous statistical description, which allows us to constrain cosmological models. New surveys just 
now getting underway will be unprecedented in their uniformity and volume surveyed. The study
of the evolution of large-scale structure with time is now becoming feasible.

1 Introduction



In 1970, Alan Sandage wrote an article describing observational cosmology as a "search for two numbers",
namely the Hubble Constant $H_0$, which sets the overall scale of the universe, and the acceleration parameter $q_0$,
which measures its curvature. Although the values of these two parameters are still very much a matter of
contention, the field of observational cosmology has broadened considerably since then, as we have become aware
of the richness of the information encoded in the large-scale distribution of galaxies. The observed distribution
of galaxies on the sky shows hints of structure on large scales, but without distance information to each
individual galaxy, one is only seeing the galaxy distribution in projection. However, Hubble's law states that
due to the expansion of the universe, the redshift of a galaxy $cz$ is proportional to its distance $r$: $cz = H_0 r$.
Thus the measurement of the redshifts of galaxies allows their distances to be inferred, yielding the full
three-dimensional distribution of galaxies. In the late 1970's and early 1980's, improvements in spectrographs and
detector technology allowed redshifts of large numbers of galaxies to be measured efficiently and quickly, and the
first large redshift surveys of galaxies were carried out, many of them at Kitt Peak.

These early redshift surveys showed large voids and walls in the galaxy distribution, but the richness
of the structure present really only hit home for the majority of astronomers with the publication of the first
declination slice of the Center for Astrophysics redshift survey (Figure 1). Galaxies are distributed on the
walls of huge voids, as large as 100 Megaparsec (Mpc, where we assume a Hubble constant of 100 km s$^{-1}$ Mpc$^{-1}$
throughout) in extent. A coherent structure is seen spanning the entire map ("The Great Wall"), causing us to
ask whether yet larger structures would become apparent with a redshift survey covering a yet larger volume.
Figure 1 is a two-dimensional slice through the galaxy distribution (albeit not suffering from the projection effects
seen in the distribution of galaxies on the sky without redshifts). The extension of this survey in the third
dimension has shown that galaxies are indeed roughly distributed on the walls of spherical voids, as Figure
1 was originally interpreted, although the galaxy distribution is somewhat more filamentary than this mental
picture suggests. This has been confirmed further with the Las Campanas Redshift Survey; with over
20,000 galaxies, it is the largest single redshift survey to date.

In any case, a map like this broadens the range of questions which we want to ask in the field of observational
cosmology. In addition to the values of $H_0$ and $q_0$, we would like to know:



• What is the topology of the galaxy distribution on various scales? What are the largest coherent structures
that exist in the galaxy distribution? The Cosmological Principle states that the universe is homogeneous
and isotropic on the largest scales; is this indeed observed?

• How did this structure form in the first place? How might we constrain the parameters describing the
expanding universe, and models for structure formation with these data? What is the connection to the
formation of galaxies themselves? And how does the inhomogeneous structure relate to the peculiar velocities,
galaxy motions above and beyond the Hubble flow?

• Galaxies and clusters of galaxies are gravitationally dominated by dark matter; indeed, only of order 1%
of the matter in the universe is directly visible. What is the distribution of the galaxies relative to the dark
matter; are the galaxies a biased tracer of the mass distribution of the universe? What is the nature of the
dark matter which dominates the gravity, and therefore the dynamics, of the universe?

We can now begin to address these questions with observations of the distribution and motions of nearby 
galaxies, but we are very much limited by the non-uniformity and finite volume probed by existing datasets. 
As we describe below, we do most of our work in the context of a standard model whereby structure forms via 
gravitational instability from tiny initial fluctuations, but with present data, this paradigm is not properly tested, 
and its parameters are only poorly known. New surveys now getting underway, both at low and high redshift, 
should allow us to address all of these questions in much more quantitative detail than has been possible in the 
past. 



2 Quantifying the Distribution of Galaxies 

The rms fluctuations of the observed galaxy density field are very large on small scales, of order unity within 
spheres of radius 8 Mpc, dropping as a power law with scale, becoming a few percent at several tens of Mpc. It 
therefore makes sense to reference the density field of galaxies to its mean. Let $\rho(\mathbf{r})$ be the observed galaxy density
field smoothed with some uniform kernel; the density fluctuation field is defined as $\delta(\mathbf{r}) = (\rho(\mathbf{r}) - \langle\rho\rangle)/\langle\rho\rangle$. This
quantity can be measured from flux-limited redshift surveys, after suitable accounting for the fall-off in observed
galaxy number density with distance due to the flux limit. Figure 2 shows the observed galaxy distribution of a
redshift survey of galaxies [21] from the database of the Infrared Astronomical Satellite (IRAS), in the Supergalactic
Plane, in which many of the more prominent structures in the Local Universe lie. 
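
As an illustration of the definition above, the following minimal Python sketch computes the density fluctuation field from galaxy counts in equal-volume cells, and shows one simple way the flux-limit fall-off might be compensated, by weighting each galaxy by the inverse of a selection function. The function names, the toy counts, and the constant selection function are assumptions made for illustration; they are not taken from the surveys discussed here.

```python
import numpy as np

def density_contrast(counts):
    """Return delta = (rho - <rho>) / <rho> for an array of cell counts."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    return (counts - mean) / mean

def weighted_counts(cell_index, distances, selection, n_cells):
    """Sum 1/phi(r) weights per cell to undo the flux-limit fall-off.

    `selection` is a hypothetical callable giving the fraction of galaxies
    that survive the flux limit at each distance.
    """
    weights = 1.0 / selection(np.asarray(distances, dtype=float))
    return np.bincount(cell_index, weights=weights, minlength=n_cells)

# Toy volume-limited example: five cells with these galaxy counts.
counts = np.array([12, 8, 10, 14, 6])
delta = density_contrast(counts)
rms = np.sqrt(np.mean(delta**2))

# Toy flux-limited example: half of the galaxies are assumed to be missed
# at every distance, so each observed galaxy counts twice.
corrected = weighted_counts(
    np.array([0, 0, 1]),
    np.array([10.0, 20.0, 30.0]),
    lambda r: np.full_like(r, 0.5),
    n_cells=2,
)
```

By construction the mean of `delta` is zero, and `rms` plays the role of the rms fluctuation quoted in the text for a given cell size.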

We interpret these observations in the context of the prevailing Big-Bang model of cosmology.
Observations of the Cosmic Microwave Background (CMB) show that at a redshift $z \sim 1100$, the universe
showed deviations from uniformity at one part in $10^5$ on scales of several hundred Mpc. In standard inflationary
models for the Big Bang, these initial fluctuations have a Gaussian one-point distribution, and the Fourier modes
into which they might be decomposed have random phases. To the extent that this is true, the power spectrum
of the density field, the mean square amplitude of the Fourier modes as a function of wavenumber $k$, is a complete
statistical description of the fluctuations. The process of gravitational instability gives rise to the structures
that we see, and indeed the numbers work out; a priori predictions of the amplitude of CMB fluctuations from the
observed distribution of galaxies, which involve a large extrapolation both in time and in spatial scale, were
indeed of the order of $10^{-5}$, as observed.
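
To make the notion of a power spectrum concrete, here is a minimal one-dimensional Python sketch that measures the mean square amplitude of the Fourier modes of a Gaussian random field. The box length, the field amplitude, and the Fourier normalization (chosen so that the sum of $P(k)$ over modes, divided by the box length, equals the variance of the field) are assumptions for illustration; real survey analyses work in three dimensions and must handle the survey geometry and shot noise.

```python
import numpy as np

def power_spectrum_1d(delta, box_length):
    """Return wavenumbers k and P(k) = |delta_k|^2 * L / N^2 for a 1-D field."""
    n = delta.size
    delta_k = np.fft.fft(delta)                       # unnormalized DFT
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    power = np.abs(delta_k) ** 2 * box_length / n**2
    return k, power

rng = np.random.default_rng(0)
box_length = 512.0                                    # Mpc, hypothetical box
delta = rng.normal(0.0, 0.05, size=1024)              # Gaussian, random phases
k, power = power_spectrum_1d(delta, box_length)

# Parseval check: the summed power reproduces the variance of the field.
variance_from_power = power.sum() / box_length
variance_direct = np.mean(delta**2)
```

For a field with independent Gaussian values in each cell, as here, the measured power is flat in $k$ apart from noise; a cosmological field would instead show a $k$-dependent shape.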

In linear perturbation theory, while the amplitude of the power spectrum grows with time, the shape of the 
power spectrum does not evolve. Therefore the measurement of the present-day power spectrum allows us to 
determine the shape of the initial power spectrum, which encodes in it information on the Cosmological Density
Parameter $\Omega$ and the nature of the dark matter. Measurements of the present-day galaxy power spectrum,
and its Fourier Transform, the correlation function, from both angular and redshift surveys of galaxies have
been instrumental in demonstrating that the standard $\Omega = 1$ Cold Dark Matter model overpredicts the observed
power on small scales, thereby ruling out this once-standard model. However, the quantitative agreement
between the power spectra found from different redshift surveys is not particularly good, and their interpretation 
in terms of theoretical initial power spectra has been difficult. The two principal reasons for this difficulty are the 
presence of peculiar velocities, and our ignorance of the relative distribution of galaxies and dark matter. 

Look again at Figure 1. As described in the figure caption, the redshift distribution of galaxies is distorted by 
peculiar velocities. The radial component of these peculiar velocities $\mathbf{v}$ modifies Hubble's law,

$cz = H_0 r + \hat{\mathbf{r}} \cdot [\mathbf{v}(\mathbf{r}) - \mathbf{v}(\mathbf{0})]$ , (1)

where v(0) is the peculiar velocity of the Local Group. The net effect is to take a dense concentration of galaxies 
and spread it out in redshift space; the clustering therefore appears weaker than it would in real space. Thus on 
the relatively small scales of clusters, peculiar velocities cause the power spectrum to be underestimated. 
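
The small-scale smearing described above can be sketched numerically. The following Python example applies equation (1) to a toy cluster whose galaxies have random line-of-sight velocities. The cluster distance of 70 Mpc, its 1 Mpc real-space depth, and the sample size are illustrative assumptions; the 1000 km s$^{-1}$ velocity dispersion is the order of magnitude quoted in the caption of Figure 1, and the motion of the Local Group is set to zero for simplicity.

```python
import numpy as np

H0 = 100.0  # km/s/Mpc, the value adopted throughout the text

def redshift_velocity(r_mpc, v_radial, v_local_radial=0.0):
    """cz from equation (1): Hubble flow plus relative radial peculiar velocity."""
    return H0 * r_mpc + (v_radial - v_local_radial)

rng = np.random.default_rng(1)
n_gal = 2000                                       # hypothetical sample size
r_true = rng.normal(70.0, 1.0, size=n_gal)         # true distances, Mpc
v_pec = rng.normal(0.0, 1000.0, size=n_gal)        # radial peculiar velocities, km/s

cz = redshift_velocity(r_true, v_pec)
s = cz / H0                                        # redshift-space distance, Mpc

print(f"real-space depth (rms): {r_true.std():.1f} Mpc")
print(f"redshift-space depth (rms): {s.std():.1f} Mpc")
```

The redshift-space depth comes out roughly ten times larger than the real-space depth, which is the elongation along the line of sight seen for rich clusters in Figure 1.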

On large scales, the opposite effect happens. A large overdense region of space gravitationally attracts the
galaxies that lie near it, giving them peculiar velocities that make them appear closer to the overdensity in redshift
space than in real space. The net effect is that the structure appears compressed, and therefore of higher density,
in redshift space than it really is. This effect can be measured directly via the anisotropy it induces in
the clustering, although current measurements are very much limited by the finite volumes surveyed thus far.
Practical methods are needed to fit redshift survey data for the underlying power spectrum, accounting for peculiar 
velocities both in the linear and non-linear regime. 
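
For reference, the large-scale compression has a compact form in linear theory. The expression below is the standard linear-theory result for the redshift-space power spectrum in the distant-observer approximation; it is not written out in the text and is quoted here as a sketch. $P_s$ and $P_r$ denote the redshift-space and real-space galaxy power spectra, $\mu$ is the cosine of the angle between the wavevector and the line of sight, and $\beta = \Omega^{0.6}/b$ combines the density parameter with the bias parameter $b$.

```latex
P_s(k, \mu) = \left(1 + \beta \mu^2\right)^2 P_r(k),
\qquad \beta = \frac{\Omega^{0.6}}{b}
```

Modes along the line of sight ($\mu = 1$) are boosted by $(1 + \beta)^2$ while transverse modes ($\mu = 0$) are unchanged, which is the anisotropy referred to above. The expression does not describe the small-scale, non-linear smearing.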

Even more vexing a problem is the relative distribution of galaxies and dark matter. Astronomers have 
long parameterized their ignorance of this issue by hypothesizing a direct proportionality between the density 
fluctuations of galaxies $\delta_g$ and dark matter $\delta_{DM}$, $\delta_g = b\,\delta_{DM}$, where the biasing parameter $b$ is independent of
location or smoothing scale. Although there are plausibility arguments that this might be the case, it
is quite difficult to test this directly. Cosmological simulations show that the locations at which galaxies form
depend on various physical quantities in addition to the local dark matter density. Thus the relationship between
the galaxy and mass distribution is both non-linear and stochastic, and differs from one sample of galaxies to
another. People are only now starting to investigate how this propagates to cosmological inferences drawn from
the observed distribution of galaxies.

One important way to get a handle on this is to measure the large-scale distribution of galaxies of different 
types. The cores of rich clusters of galaxies are almost entirely composed of elliptical and lenticular galaxies,
while these two populations represent only 20% of galaxies in the field. If the distributions of different types of
galaxies do not agree with one another, they cannot all agree with the distribution of dark matter! On scales larger
than clusters, it is known that there are not gross differences in the relative distribution of galaxies of different
types or different luminosities, although the clustering strength does depend on these quantities.
These analyses are limited by the small samples and poor morphological information available; with larger and 
better samples, we might hope to learn much more about the nature of bias, which, as hinted at above, may teach 
us about the process of galaxy formation. 

Even if the initial density field is perfectly Gaussian, and thus can be completely described by the power 
spectrum, gravitational instability theory predicts that as the fluctuations grow in strength, the distribution of $\delta$
must develop a skewness, whose value can be calculated to leading order in perturbation theory. In addition,
there are classes of models in which large-scale clustering is seeded by exotic initial structures such as textures,
which imprint non-Gaussian features in the initial density field. Thus measurements of the non-Gaussianity of
the galaxy distribution allow one to test gravitational instability theory, and to look for signatures of these seeds.
Alternatively, the topology of the galaxy distribution is a measure of the correlation of the relative phases of Fourier
modes. One measures the genus number of surfaces of constant galaxy density; comparison with the prediction
in the random-phase case gives a clean measure of non-Gaussianity. Measurements of high-order correlations from
redshift surveys and angular catalogs are in impressive accord with gravitational instability predictions.
Similarly, topology measurements show the qualitative effects expected from non-linear growth of fluctuations
from Gaussian initial conditions. Indeed, there is as yet no convincing evidence for initial non-Gaussian seeding
of the density field. However, existing data are not powerful enough to make these constraints very strong. 
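
As a concrete instance of the skewness prediction mentioned above, the leading-order perturbation theory result for the mass density field is usually quoted as the ratio below. The value 34/7 is the standard result for an unsmoothed field with Gaussian initial conditions and $\Omega = 1$; it is not stated in the text and is given here only as a sketch. Smoothing lowers the value by an amount that depends on the slope of the power spectrum, and galaxy bias modifies it further.

```latex
S_3 \equiv \frac{\langle \delta^3 \rangle}{\langle \delta^2 \rangle^2} = \frac{34}{7}
```

Because $\langle \delta^3 \rangle$ scales as the square of the variance, the ratio $S_3$ is independent of the amplitude of the fluctuations, which is what makes it a useful test of gravitational instability.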



3 Peculiar Velocities 

We have seen that large-scale inhomogeneities in the galaxy distribution should give rise to peculiar velocities. 
In linear perturbation theory, there is a linear relation between the density and velocity fields:
$\nabla \cdot \mathbf{v}(\mathbf{r}) = -H_0\,\Omega^{0.6}\,\delta_{DM}(\mathbf{r})$. The Hubble Constant $H_0$ is identically equal to unity if one measures the distances to galaxies
in redshift units. Thus a comparison between the galaxy density distribution and the peculiar velocity field allows
a measurement of the Cosmological Density Parameter $\Omega$. However, because peculiar velocities are caused by the
gravitational attraction of all matter, while the density fluctuations we can measure directly are those of galaxies,
we are affected by biasing in this comparison. In particular, for linear biasing, the relationship between the galaxy
velocity and density fields becomes

$\nabla \cdot \mathbf{v}(\mathbf{r}) = -\beta\,\delta_g(\mathbf{r})$ , (2)

where $\beta = \Omega^{0.6}/b$.

By far the best-measured peculiar velocity is our own; we measure a 0.1% dipole moment in the temperature
of the CMB, which is interpreted as a Doppler effect due to the Sun's motion relative to the rest frame defined
by all the material which radiates the CMB photons we detect. Correcting for the rotation of the Milky Way, and
the infall of the Milky Way to the barycenter of the Local Group yields a peculiar velocity of 620 km s$^{-1}$ towards
Galactic coordinates $l = 276^\circ$, $b = +30^\circ$. Using an integral form of equation (2) allows this value to be compared
with the prediction from a redshift survey; a detailed analysis yields $\beta = 0.55$, with asymmetric uncertainties.

The radial component of the peculiar velocity for a galaxy follows immediately from independent measures of 
its distance and redshift (equation [1], where as before, we work in units in which $H_0 = 1$). One determines the
relative distances of galaxies using distance indicator relations, whereby their rotation velocities (spirals) or
internal velocity dispersions (ellipticals) are related to their optical luminosities; a measurement of their apparent
brightnesses then yields their distances via the inverse square law. The telescopes at Kitt Peak have been
instrumental for the development and refinement of these distance indicators, and indeed, much of the data in the 
analyses described below was obtained at Kitt Peak. 

These distance indicator relations show appreciable scatter, allowing the measurement of the distances of 
individual galaxies to 15-20% accuracy. For a galaxy at a distance of 40 Mpc, this is a 600 km s$^{-1}$ error at best,
which of course propagates directly into the inferred peculiar velocity. As this is of the order of the peculiar velocity
itself, the signal-to-noise ratio in the inferred peculiar velocity field is less than one per galaxy. Moreover, the
results are subject to substantial statistical biases depending on the details of how the distance indicator relation is
used. One therefore requires a heavy grouping or smoothing algorithm, or an elaborate statistical method,
to extract useful information from such data. 
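
The arithmetic behind the 600 km s$^{-1}$ figure, and the reason grouping helps, can be sketched in a few lines of Python. The function names are hypothetical, and the $1/\sqrt{N}$ scaling assumes independent errors of equal size with none of the statistical biases mentioned above.

```python
import math

H0 = 100.0  # km/s/Mpc, the value adopted throughout the text

def velocity_error(distance_mpc, fractional_distance_error):
    """Peculiar-velocity error (km/s) implied by a fractional distance error."""
    return H0 * distance_mpc * fractional_distance_error

def grouped_error(single_error, n_galaxies):
    """Error on the mean velocity of n galaxies with independent, equal errors."""
    return single_error / math.sqrt(n_galaxies)

single = velocity_error(40.0, 0.15)   # 15% distance error at 40 Mpc
group = grouped_error(single, 100)    # averaging a hypothetical group of 100
print(f"per galaxy: {single:.0f} km/s, group of 100: {group:.0f} km/s")
```

Under these assumptions a group of 100 galaxies brings the error down by a factor of ten, to well below typical peculiar velocities of several hundred km s$^{-1}$.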



In order to trace out the full velocity field, we use peculiar velocity samples that cover much of the sky.
A number of papers have developed rigorous techniques for comparing peculiar velocity and redshift
survey data via the integral form of equation (2). These results indicate the likelihood of systematic effects at some
level in the Tully-Fisher (TF) data, probably due to the difficulty of matching separate samples across the sky; when these
are corrected for, authors agree that the data are consistent with equation (2), and that $\beta = 0.5$-$0.6$ (ref. [54] is
the exception, with the smaller value of $\beta = 0.34 \pm 0.13$).

Alternatively, one can work from equation (2) in its differential form. To the extent that the peculiar velocity
field is irrotational (as expected under gravitational instability on all but the smallest, most non-linear scales),
it can be expressed as the gradient of a scalar potential: $\mathbf{v} = -\nabla\Phi$. This potential field can be determined
by integrating the observed radial velocity field $\hat{\mathbf{r}} \cdot \mathbf{v}(\mathbf{r})$ (after suitable smoothing) along radial rays. The full
three-dimensional velocity field $\mathbf{v}(\mathbf{r})$ is the gradient of the resulting field, from which the divergence, or a suitable
non-linear extension, can be measured. The effective smoothing of the resulting map is set by the initial
smoothing of the raw peculiar velocities; in practice, this smoothing is a 1200 km s$^{-1}$ Gaussian.
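
The reconstruction just described can be written compactly. The following is a sketch that follows directly from $\mathbf{v} = -\nabla\Phi$, with the assumption that the potential is normalized to zero at the observer; $u$ is introduced here as shorthand for the smoothed radial velocity and is not notation from the text.

```latex
\Phi(r, \theta, \phi) = -\int_0^{r} u(r', \theta, \phi)\, dr',
\qquad u \equiv \hat{\mathbf{r}} \cdot \mathbf{v},
\qquad \mathbf{v} = -\nabla \Phi
```

Only the radial component of the velocity is observed, but because the radial derivative of $\Phi$ is $-u$, integrating along each ray recovers the full potential, and its transverse gradients then supply the two unobserved velocity components.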

A comparison of the resulting inferred density field with the IRAS galaxy density field shows beautiful
consistency, as predicted from linear biasing and gravitational instability theory. This analysis finds $\beta = 0.89 \pm 0.12$,
in strong disagreement with the analyses quoted thus far (some of which used essentially the same redshift survey
and peculiar velocity data!). The reason for this disagreement is not yet clear. This may be due to subtle systematic
effects in the data; alternatively, it is possible that our assumption of a scale-independent bias is incorrect, although
a self-consistent model which explains all the data has not yet been developed. 

Although we do not have a direct measure of the value of the bias parameter b, current theoretical models 
indicate that it is of the order of unity. Thus the results above translate into values of $\Omega$ around 0.3 and 1.0,
respectively. The former value is close to what is inferred from studies of the abundances and evolution of clusters
of galaxies, and the observed shape of the power spectrum (see the previous section), while the latter is the
"cleanest" prediction of inflationary models. The community has recently moved over towards the $\Omega \sim 0.3$ camp for
the most part; the analysis cited above giving $\beta = 0.89$ remains one of the strongest arguments for an appreciably
larger value of $\Omega$. But the result will not be settled until it is understood why different analyses with the same
data are giving such disparate results. 
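
The translation from $\beta$ to the density parameter is a one-line inversion of $\beta = \Omega^{0.6}/b$. The Python sketch below makes the dependence on the assumed bias explicit; the function name and the default $b = 1$ are illustrative assumptions, and the quoted uncertainties on $\beta$ are not propagated.

```python
def omega_from_beta(beta, bias=1.0, exponent=0.6):
    """Invert beta = Omega**exponent / bias for the density parameter Omega."""
    return (beta * bias) ** (1.0 / exponent)

for beta in (0.55, 0.89):
    print(f"beta = {beta:.2f}, b = 1  ->  Omega = {omega_from_beta(beta):.2f}")
```

For $b = 1$ this gives values of about 0.37 and 0.82, in line with the rough figures quoted in the text. Because the density parameter scales as $(\beta b)^{1/0.6}$, even modest uncertainty in the bias shifts the inferred value substantially.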



4 The Future 

The various quantitative analyses of large-scale structure described above are limited by systematic effects, either 
due to non-uniformities in the data at some level, or the smallness of the volumes surveyed. For example, consider 
the measurement of the galaxy power spectrum on the largest scales. This is a crucial quantity, as it can be a 
strong constraint on cosmological models. Systematic errors in the photometry of galaxies from which a sample is 
selected for a redshift survey will mimic large-scale gradients in the density field, causing systematic errors in the 
inferred power spectrum. In addition, on scales approaching the scale of the sample, one by definition only has 
a small number of independent volumes on which to measure density fluctuations. Thus for any redshift survey, 
there will be a scale, comparable to the volume of the survey, for which the sample is not fair for measuring the 
power spectrum. 
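
The limit set by the number of independent volumes can be estimated by counting Fourier modes. The Python sketch below uses the standard mode-counting approximation for a Gaussian field, in which the fractional error on the power in a shell of wavenumbers is $\sqrt{2/N}$, with $N$ the number of modes in the shell. It ignores shot noise and the survey geometry, and the survey volumes and wavenumbers are hypothetical round numbers rather than values from the text.

```python
import math

def n_modes(volume_mpc3, k, dk):
    """Number of Fourier modes in a shell [k, k + dk] for a survey volume V."""
    return volume_mpc3 * 4.0 * math.pi * k**2 * dk / (2.0 * math.pi) ** 3

def fractional_power_error(volume_mpc3, k, dk):
    """Approximate sigma_P / P = sqrt(2 / N_modes); shot noise is ignored."""
    return math.sqrt(2.0 / n_modes(volume_mpc3, k, dk))

small_survey = 1.0e6   # Mpc^3, hypothetical
large_survey = 1.0e7   # Mpc^3, hypothetical
for volume in (small_survey, large_survey):
    err = fractional_power_error(volume, k=0.05, dk=0.01)
    print(f"V = {volume:.0e} Mpc^3: fractional error on P(k = 0.05) ~ {err:.2f}")
```

The error falls only as the square root of the volume, and it grows rapidly towards small $k$, where the number of modes that fit inside the survey becomes small.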

These considerations motivate us to consider the largest, most uniform possible survey of the galaxy distribution.
The Sloan Digital Sky Survey is such a survey. Over the course of five years, a wide-field CCD
camera on a dedicated wide-field 2.5m telescope will survey the entire Northern Galactic cap (1/4 of the Celestial
Sphere) in 5 colors to a depth of $r' = 23.1$, creating a highly uniform digital catalog of over $10^8$ galaxies and stars.
The brightest million of these galaxies will be targets for a redshift survey, to be carried out by a pair of fiber-fed 
multi-object double spectrographs on the same telescope. The resulting database will be an order of magnitude 
increase in the volume probed by redshift surveys, and a factor of 40 increase in the number of galaxies in a single 
survey. More important, it will have a much tighter control of systematic errors than previous surveys. Finally, the 
detailed photometric data available for each galaxy will allow large-scale structure studies to be done as a function 
of galaxy morphological type, color, and luminosity, allowing us to quantify the relative bias, and to develop more 
complete models for the underlying dark matter density field than the simplistic linear bias assumption allows us. 

A competing survey uses a wide-field spectrograph attached to the Anglo-Australian Telescope to obtain
redshifts for 250,000 galaxies selected from photographic plates. This survey has obtained redshifts for almost 
10,000 galaxies as of this writing. As their galaxy selection is based on shallower, single-color photographic plate 
material, their control on systematic effects will not be as great as in the Sloan survey. 

These two surveys will make definitive measurements of the nature of large-scale structure in the nearby 
universe. Analyses show that they should measure the power spectrum to an accuracy sufficient to measure a 
large host of cosmological parameters, especially when coupled with measurements of the CMB.

A new frontier in the field is just now opening up: the study of large-scale structure at high redshift. The
finite speed of light means that we see distant galaxies at a time when the universe was younger than it is today. 
Thus we can study directly the growth of large-scale structure with time, by measuring the distribution of distant 
galaxies. Indeed, gravitational instability theory makes definite predictions for the evolution of structure with 
redshift, which can be checked from such measurements. 

One probe of the evolution of clustering has been via wide-field photometric surveys of faint galaxies. The
angular correlations of galaxies can be measured directly from deep images of the sky; some of the best
work in this field is being done at Kitt Peak. Indeed, techniques have been developed recently for measuring
approximate redshifts of galaxies from their broadband colors, which allow the redshift dependence of the
angular clustering to be measured directly. The KPNO 4-m has been instrumental in obtaining the broadband
photometric data one needs to calibrate and measure these photometric redshifts. 

With accurate spectroscopic redshifts of high-redshift samples, more detailed analyses are possible. Two 
Canadian and French groups have carried out extensive redshift surveys of up to several thousand galaxies
in clusters and the field, to a redshift of $z \sim 1$, and have been able to measure the evolution of the small-scale
clustering of galaxies. Most dramatically, Steidel and colleagues have been able to identify galaxies likely to
lie at $z > 3$ by their particularly red $U - B$ colors due to Lyman continuum absorption redshifted to the $U$
band (a crude form of photometric redshift); spectroscopic surveys of these extremely faint objects with the 10-m
Keck telescopes on Mauna Kea have resulted in redshifts of literally hundreds of these galaxies. They have found
astonishingly strong clustering at $z \sim 3$. Gravitational instability theory predicts appreciably weaker clustering
at high redshift for the dark matter; the interpretation, then, is that this early population of galaxies is much
more strongly biased relative to the dark matter than are galaxies today. In retrospect, this indeed makes sense;
one generically expects the bias factor to approach unity as galaxies are gravitationally drawn to the dark matter
potential wells, but it tells us that the study of the evolution of large-scale structure will require a deep
understanding of the evolution of bias, and therefore probably also of galaxy formation. New extensive redshift
surveys of faint galaxies planned for the Keck telescope and other 8-10 meter class telescopes coming on line around
the world promise that insights into this problem will be coming soon; the bottleneck for this work, ironically,
is the lack of deep imaging surveys of galaxies on wide-field 2.5-m class telescopes from which the objects for
the redshift surveys can be selected. Kitt Peak is leading the world-wide effort to fill this gap (Jannuzi, private
communication).
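
The color selection described above rests on simple redshift arithmetic: light blueward of the Lyman limit is absorbed, and the wavelength of that edge moves redward as $(1 + z)$. The Python sketch below assumes a rest-frame Lyman limit of 912 Angstroms and treats the $U$ band as covering very roughly 3000 to 4000 Angstroms; both numbers are approximate values supplied for illustration, not taken from the text.

```python
LYMAN_LIMIT_ANGSTROM = 912.0           # rest-frame Lyman continuum edge (approx.)
U_BAND_ANGSTROM = (3000.0, 4000.0)     # rough extent of the U band (assumption)

def observed_wavelength(rest_wavelength, z):
    """Observed wavelength of a rest-frame feature at redshift z."""
    return rest_wavelength * (1.0 + z)

def edge_in_u_band(z):
    """True if the redshifted Lyman limit falls inside the assumed U band."""
    low, high = U_BAND_ANGSTROM
    return low <= observed_wavelength(LYMAN_LIMIT_ANGSTROM, z) <= high

for z in (1.0, 3.0):
    edge = observed_wavelength(LYMAN_LIMIT_ANGSTROM, z)
    print(f"z = {z}: Lyman limit observed at {edge:.0f} Angstroms, in U band: {edge_in_u_band(z)}")
```

At $z = 3$ the edge lies in the middle of the assumed $U$ band, so such a galaxy is strongly dimmed in $U$ relative to redder bands, while at $z = 1$ the edge is still far in the ultraviolet and the color is unaffected.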

The field of large-scale structure is reaching a new point of maturity, which will allow us to address the big 
questions in the field outlined at the end of § 1. With the new surveys of the local universe which are just getting
underway, we should have accurate measurements of a number of cosmological parameters, and a firm zeropointing 
of the present-day large-scale structure with which to compare the exciting results from observations of structure 
at high redshift. These surveys will be large enough to make a definitive measurement of the topology of the 
galaxy distribution on the largest scales. To the extent that we fit the data convincingly with a cosmological 
model without large numbers of ad-hoc parameters, we can claim to have a complete model for the formation of 
structure. Measurement of large-scale structure as traced by galaxies of different types will give us insight into 
the increasingly complicated problem of the relative distribution of galaxies and dark matter. A full picture will 
become clearer with observations of the evolution of galaxy clustering with redshift, together with a more thorough 
theoretical understanding of the evolution of bias. 

We will really only be able to claim that we have a coherent cosmological model from these data if the results 
are consistent with a variety of other cosmological probes which we have not had space to cover in this review. The 
CMB anisotropies give a complementary measurement of the mass power spectrum; the Microwave Anisotropy
Probe (MAP; see http://map.gsfc.nasa.gov), a satellite which will fly in Fall 2000, is designed to make such
a measurement. Measurements of quasar absorption lines probe the power spectrum on the smallest scales;
recent work has shown that the extraction of this information is quite straightforward. Large-scale structure
can be probed via the distribution of clusters of galaxies, and as we mentioned briefly above, their mass
function and evolution also give a sensitive measure of cosmological parameters; new large complete surveys in
the optical and the X-ray will give us an enormous increase in our understanding of these objects.

Figure 1: The distribution of galaxies in redshift space from the survey of ref. [40]. The sample covers a narrow
range of declination, so redshift is plotted against right ascension, with declination suppressed. The large elongated
structure in the middle of the map is the Coma cluster of galaxies. In the deep potential well of the cluster, galaxies
have random motions in addition to their motion due to universal expansion, with a velocity dispersion of order
1000 km s$^{-1}$. Thus the redshift diagram of a rich cluster is stretched out along the line of sight; the resulting
structure is often referred to as a "Finger of God", due to the fact that it points directly to the observer at the
origin.

Figure 2: The distribution of galaxies in redshift space from the survey of ref. [21]. The left-hand panel shows
those galaxies within 22.5° of the Supergalactic plane, while the right-hand panel shows the corresponding smooth
density field, using the methods of ref. [59]. Labels show several of the prominent named structures, including the
Virgo cluster (V), the Perseus-Pisces supercluster (P-P), the Great Attractor (GA), and the Sculptor Void (Sc). 






